41 Why nitpicking works: evidence for Occam's Razor in error correctors

Dekai Wu, Grace Ngai, Marine Carpuat

August 2004 Proceedings of the 20th international conference on Computational
Linguistics COLING '04

Publisher: Association for Computational Linguistics

Full text available: pdf(283.34 KB) Additional Information: full citation, abstract, references

Empirical experience and observations have shown us when powerful and highly tunable 
classifiers such as maximum entropy classifiers, boosting and SVMs are applied to 
language processing tasks, it is possible to achieve high accuracies, but eventually their 
performances all tend to plateau out at around the same point. To further improve 
performance, various error correction mechanisms have been developed, but in practice,
most of them cannot be relied on to predictably improve performance on un ... 

42 Verb phrase ellipsis detection using automatically parsed text
Leif Arda Nielsen 

August 2004 Proceedings of the 20th international conference on Computational 
Linguistics COLING '04 

Publisher: Association for Computational Linguistics 

Full text available: pdf(170.05 KB) Additional Information: full citation, abstract, references

This paper describes a Verb Phrase Ellipsis (VPE) detection system, built for robustness, 
accuracy and domain independence. The system is corpus-based, and uses a variety of 
machine learning techniques on free text that has been automatically parsed using two 
different parsers. Tested on a mixed corpus comprising a range of genres, the system 
achieves a 72% F1-score. It is designed as the first stage of a complete VPE resolution
system that is input free text, detects VPEs, and proceeds to find ... 

43 Linguistically informed statistical models of constituent structure for ordering in
sentence realization

Eric Ringger, Michael Gamon, Robert C. Moore, David Rojas, Martine Smets, Simon Corston-Oliver

August 2004 Proceedings of the 20th international conference on Computational 
Linguistics COLING '04 

Publisher: Association for Computational Linguistics 

Full text available: pdf(136.95 KB) Additional Information: full citation, abstract, references

We present several statistical models of syntactic constituent order for sentence
realization. We compare several models, including simple joint models inspired by existing
statistical parsing models, and several novel conditional models. The conditional models 
leverage a large set of linguistic features without manual feature selection. We apply and 
evaluate the models in sentence realization for French and German and find that a 
particular conditional model outperforms all others. We employ a ... 

44 High-performance tagging on medical texts
Udo Hahn, Joachim Wermter 

August 2004 Proceedings of the 20th international conference on Computational 
Linguistics COLING '04 

Publisher: Association for Computational Linguistics 

Full text available: pdf(96.19 KB) Additional Information: full citation, abstract, references

We ran both Brill's rule-based tagger and TNT, a statistical tagger, with a default German 
newspaper-language model on a medical text corpus. Supplied with limited lexicon 
resources, TNT outperforms the Brill tagger with state-of-the-art performance figures 
(close to 97% accuracy). We then trained TNT on a large annotated medical text corpus, 
with a slightly extended tagset that captures certain medical language particularities, and 
achieved 98% tagging accuracy. Hence, statistical off-the-shelf ...

45 Acquiring the meaning of discourse markers
Ben Hutchinson 

July 2004 Proceedings of the 42nd Annual Meeting on Association for Computational
Linguistics ACL '04

Publisher: Association for Computational Linguistics

Full text available: pdf(95.33 KB) Additional Information: full citation, abstract, references

This paper applies machine learning techniques to acquiring aspects of the meaning of 
discourse markers. Three subtasks of acquiring the meaning of a discourse marker are 
considered: learning its polarity, veridicality, and type (i.e. causal, temporal or
additive). Accuracy of over 90% is achieved for all three tasks, well above the baselines. 

46 Incremental parsing with the perceptron algorithm
Michael Collins, Brian Roark 

July 2004 Proceedings of the 42nd Annual Meeting on Association for Computational 
Linguistics ACL '04 

Publisher: Association for Computational Linguistics 

Full text available: pdf(141.53 KB) Additional Information: full citation, abstract, references

This paper describes an incremental parsing approach where parameters are estimated 
using a variant of the perceptron algorithm. A beam-search algorithm is used during both 
training and decoding phases of the method. The perceptron approach was implemented 
with the same feature set as that of an existing generative model (Roark, 2001a), and 
experimental results show that it gives competitive performance to the generative model 
on parsing the Penn treebank. We demonstrate that training a perceptr ... 

47 Discriminative language modeling with conditional random fields and the perceptron
algorithm

Brian Roark, Murat Saraclar, Michael Collins, Mark Johnson 

July 2004 Proceedings of the 42nd Annual Meeting on Association for Computational 
Linguistics ACL '04 

Publisher: Association for Computational Linguistics 

Full text available: pdf(219.59 KB) Additional Information: full citation, abstract, references

This paper describes discriminative language modeling for a large vocabulary speech 
recognition task. We contrast two parameter estimation methods: the perceptron 
algorithm, and a method based on conditional random fields (CRFs). The models are 
encoded as deterministic weighted finite state automata, and are applied by intersecting
the automata with word-lattices that are the output from a baseline recognizer. The 
perceptron algorithm has the benefit of automatically selecting a relatively small ... 

48 A maximum entropy approach to species distribution modeling 
Steven J. Phillips, Miroslav Dudik, Robert E. Schapire 

July 2004 Proceedings of the twenty-first international conference on Machine 
learning ICML '04 

Publisher: ACM Press 

Full text available: pdf(163.78 KB) Additional Information: full citation, abstract, references, citings

We study the problem of modeling species geographic distributions, a critical problem in
conservation biology. We propose the use of maximum-entropy techniques for this 
problem, specifically, sequential-update algorithms that can handle a very large number 
of features. We describe experiments comparing maxent with a standard distribution- 
modeling tool, called GARP, on a dataset containing observation data for North American 
breeding birds. We also study how well maxent performs as a function of ... 

49 Head-Driven Statistical Models for Natural Language Parsing 

Michael Collins 

December 2003 Computational Linguistics, volume 29 issue 4 
Publisher: MIT Press 

Full text available: pdf(633.30 KB) Additional Information: full citation, abstract, references, citings, index terms

This article describes three statistical models for natural language parsing. The models
extend methods from probabilistic context-free grammars to lexicalized grammars,
leading to approaches in which a parse tree is represented as the sequence of decisions
corresponding to a head-centered, top-down derivation of the tree. Independence
assumptions then lead to parameters that encode the X-bar schema, subcategorization,
ordering of complements, placement of adjuncts, bigram lexical dependencies, ... 

50 Knowledge management session 2: semantic web: Learning cross-document
structural relationships using boosting
Zhu Zhang, Jahna Otterbacher, Dragomir Radev 

November 2003 Proceedings of the twelfth international conference on Information 
and knowledge management CIKM '03 

Publisher: ACM Press 

Full text available: pdf(145.37 KB) Additional Information: full citation, abstract, references, index terms

Multi-document discourse analysis has emerged with the potential of improving various
information retrieval applications. Based on the newly proposed Cross-document
Structure Theory (CST), this paper describes an empirical study that uses boosting to 
classify CST relationships between sentence pairs extracted from topically related 
documents. We show that the binary classifier for determining existence of structural 
relationships significantly outperforms the baseline. We also achieve promising r ... 

Keywords: boosting, classification, cross-document structure, discourse analysis 

Multimedia MULTIMEDIA '03
Publisher: ACM Press

Recent interest in the area of music information retrieval and related technologies is
exploding. However, very few of the existing techniques take advantage of recent 
developments in statistical modeling. In this paper we discuss an application of Random 
Fields to the problem of creating accurate yet flexible statistical models of polyphonic 
music. With such models in hand, the challenges of developing effective searching, 
browsing and organization techniques for the growing bodies of music col ... 

52 Embedding Web-based statistical translation models in cross-language information
retrieval

Wessel Kraaij, Jian-Yun Nie, Michel Simard

September 2003 Computational Linguistics, Volume 29 issue 3 

Publisher: MIT Press 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Although more and more language pairs are covered by machine translation (MT) 
services, there are still many pairs that lack translation resources. Cross-language 
information retrieval (CLIR) is an application that needs translation functionality of a
relatively low level of sophistication, since current models for information retrieval (IR) 
are still based on a bag of words. The Web provides a vast resource for the automatic 
construction of parallel corpora that can be used to train statistical ... 

53 Posters: Music modeling with random fields
Victor Lavrenko, Jeremy Pickens

July 2003 Proceedings of the 26th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '03

Publisher: ACM Press

Full text available: pdf(98.04 KB) Additional Information: full citation, references, index terms



54 A fast algorithm for feature selection in conditional maximum entropy modeling
Yaqian Zhou, Lide Wu, Fuliang Weng, Hauke Schmidt

July 2003 Proceedings of the 2003 conference on Empirical methods in natural
language processing - Volume 10

Publisher: Association for Computational Linguistics

Full text available: pdf(267.62 KB) Additional Information: full citation, abstract, references, citings

This paper describes a fast algorithm that selects features for conditional maximum 
entropy modeling. Berger et al. (1996) presents an incremental feature selection (IFS) 
algorithm, which computes the approximate gains for all candidate features at each 
selection stage, and is very time-consuming for any problems with large feature spaces. 
In this new algorithm, instead, we only compute the approximate gains for the top-ranked
features based on the models obtained from previous stages. Experimen ... 

55 Single character Chinese named entity recognition
Xiaodan Zhu, Mu Li, Jianfeng Gao, Chang-Ning Huang 

July 2003 Proceedings of the second SIGHAN workshop on Chinese language
processing - Volume 17

Publisher: Association for Computational Linguistics

Full text available: pdf(176.35 KB) Additional Information: full citation, abstract, references, citings

Single character named entity (SCNE) is a name entity (NE) composed of one Chinese
character, such as "[Abstract contained text which could not be captured.]" (zhong1,
China) and "[Abstract contained text which could not be captured.]" (e2, Russia). SCNE is
very common in written Chinese text. However, due to the lack of in-depth research,
SCNE is a major source of errors in named entity recognition (NER). This paper 
formulates the SCNE recognition within the source-channel model f ... 

56 Identification of patients with congestive heart failure using a binary classifier: a case
study

Serguei V. Pakhomov, James Buntrock, Christopher G. Chute 

July 2003 Proceedings of the ACL 2003 workshop on Natural language processing in
biomedicine - Volume 13

Publisher: Association for Computational Linguistics

Full text available: pdf(97.06 KB) Additional Information: full citation, abstract, references

This paper addresses a very specific problem that happens to be common in health 
science research. We present a machine learning based method for identifying patients 
diagnosed with congestive heart failure and other related conditions by automatically
classifying clinical notes. This method relies on a Perceptron neural network classifier 
trained on comparable amounts of positive and negative samples of clinical notes 
previously categorized by human experts. The documents are represented as fea ... 



57 Closing the gap: learning-based information extraction rivaling knowledge- 
engineering methods 

Hai Leong Chieu, Hwee Tou Ng, Yoong Keok Lee

July 2003 Proceedings of the 41st Annual Meeting on Association for Computational
Linguistics - Volume 1 ACL '03

Publisher: Association for Computational Linguistics

Full text available: pdf(115.36 KB) Additional Information: full citation, abstract, references, citings

In this paper, we present a learning approach to the scenario template task of information 
extraction, where information filling one template could come from multiple sentences. 
When tested on the MUC-4 task, our learning approach achieves accuracy competitive to 
the best of the MUC-4 systems, which were all built with manually engineered rules. Our
analysis reveals that our use of full parsing and state-of-the-art learning algorithms have 
contributed to the good performance. To our knowledge, t ... 

58 Confidence estimation for translation prediction
Simona Gandrabur, George Foster

2003 Proceedings of the seventh conference on Natural language learning at
HLT-NAACL 2003 - Volume 4

Publisher: Association for Computational Linguistics

Full text available: pdf(342.14 KB) Additional Information: full citation, abstract, references

The purpose of this work is to investigate the use of machine learning approaches for 
confidence estimation within a statistical machine translation application. Specifically, we 
attempt to learn probabilities of correctness for various model predictions, based on the 
native probabilities (i.e. the probabilities given by the original model) and on features of
the current context. Our experiments were conducted using three original translation 
models and two types of neural nets (single-layer and m ... 




59 Inducing history representations for broad coverage statistical parsing
James Henderson 

May 2003 Proceedings of the 2003 Conference of the North American Chapter of the 
Association for Computational Linguistics on Human Language Technology 
- Volume 1 NAACL '03 

Publisher: Association for Computational Linguistics 

Full text available: pdf(129.57 KB) Additional Information: full citation, abstract, references, citings

We present a neural network method for inducing representations of parse histories and
using these history representations to estimate the probabilities needed by a statistical
left-corner parser. The resulting statistical parser achieves performance (89.1%
F-measure) on the Penn Treebank which is only 0.6% below the best current parser for this
task, despite using a smaller vocabulary size and less prior linguistic knowledge. Crucial to
this success is the use of structurally determined soft bi ...

http://portal.acm.org/results.cfm?query=%2B%22language%20processing%22%20%2B% 2/26/2007

Results (page 3): +"language processing" +"maximum entropy" +feature +gain +rank* Page 6 of 6

60 In question answering, two heads are better than one

Jennifer Chu-Carroll, Krzysztof Czuba, John Prager, Abraham Ittycheriah

May 2003 Proceedings of the 2003 Conference of the North American Chapter of the
Association for Computational Linguistics on Human Language Technology
- Volume 1 NAACL '03

Publisher: Association for Computational Linguistics

Full text available: pdf(102.68 KB) Additional Information: full citation, abstract, references, citings

Motivated by the success of ensemble methods in machine learning and other areas of
natural language processing, we developed a multi-strategy and multi-source approach to 
question answering which is based on combining the results from different answering 
agents searching for answers in multiple corpora. The answering agents adopt 
fundamentally different strategies, one utilizing primarily knowledge-based mechanisms 
and the other adopting statistical techniques. We present our multi-level answer ... 

61 Supervised and unsupervised PCFG adaptation to novel domains
Brian Roark, Michiel Bacchiani 

May 2003 Proceedings of the 2003 Conference of the North American Chapter of the 
Association for Computational Linguistics on Human Language Technology 
- Volume 1 NAACL '03 

Publisher: Association for Computational Linguistics 

Full text available: pdf(139.23 KB) Additional Information: full citation, abstract, references

This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) 
to a novel domain, using maximum a posteriori (MAP) estimation. The MAP framework is 
general enough to include some previous model adaptation approaches, such as corpus
mixing in Gildea (2001), for example. Other approaches falling within this framework are 
more effective. In contrast to the results in Gildea (2001), we show F-measure parsing 
accuracy gains of as much as 2.5% for high accuracy lexical ... 

62 Challenges in information retrieval and language modeling: report of a workshop held
at the center for intelligent information retrieval, University of Massachusetts
Amherst, September 2002

James Allan, Jay Aslam, Nicholas Belkin, Chris Buckley, Jamie Callan, Bruce Croft, Sue 
Dumais, Norbert Fuhr, Donna Harman, David J. Harper, Djoerd Hiemstra, Thomas Hofmann, 
Eduard Hovy, Wessel Kraaij, John Lafferty, Victor Lavrenko, David Lewis, Liz Liddy, R. 
Manmatha, Andrew McCallum, Jay Ponte, John Prager, Dragomir Radev, Philip Resnik,
Stephen Robertson, Roni Rosenfeld, Salim Roukos, Mark Sanderson, Rich Schwartz, Amit
Singhal, Alan Smeaton, Howard Turtle, Ellen Voorhees, Ralph Weischedel, Jinxi Xu,
ChengXiang Zhai 

April 2003 ACM SIGIR Forum, Volume 37 Issue 1 
Publisher: ACM Press 

Full text available: pdf(1.50 MB) Additional Information: full citation, citings, index terms, review

March 2003 The Journal of Machine Learning Research, Volume 3
Publisher: MIT Press 

Full text available: pdf(266.18 KB) Additional Information: full citation, abstract, citings, index terms

Dimensionality reduction of empirical co-occurrence data is a fundamental problem in
unsupervised learning. It is also a well studied problem in statistics known as the analysis
of cross-classified data. One principled approach to this problem is to represent the data
in low dimension with minimal loss of (mutual) information contained in the original data.
In this paper we introduce an information theoretic nonlinear method for finding such a
most informative dimension reduction. In contrast wi ...

64 Automatic labeling of semantic roles
Daniel Gildea, Daniel Jurafsky

September 2002 Computational Linguistics, volume 28 issue 3 
Publisher: MIT Press

Full text available: pdf(573.51 KB) Additional Information: full citation, abstract, references, citings, index terms

We present a system for identifying the semantic relationships, or semantic roles, filled
by constituents of a sentence within a semantic frame. Given an input sentence and a
target word and frame, the system labels constituents with either abstract semantic roles,
such as AGENT or PATIENT, or more domain-specific semantic roles, such as SPEAKER, 
MESSAGE, and TOPIC. The system is based on statistical classifiers trained on roughly 
50,000 sentences that were h ... 

65 The disambiguation of nominalizations

Maria Lapata 

September 2002 Computational Linguistics, volume 28 issue 3 
Publisher: MIT Press 

Full text available: pdf(471.69 KB) Additional Information: full citation, abstract, references, citings, index terms

This article addresses the interpretation of nominalizations, a particular class of compound
nouns whose head noun is derived from a verb and whose modifier is interpreted as an
argument of this verb. Any attempt to automatically interpret nominalizations needs to 
take into account: (a) the selectional constraints imposed by the nominalized compound 
head, (b) the fact that the relation of the modifier and the head noun can be ambiguous, 
and (c) the fact that these constraints can be easily overr ... 

66 Combining classifiers for Chinese word segmentation
Nianwen Xue, Susan P. Converse 

September 2002 Proceeding of the first SIGHAN workshop on Chinese language
processing - Volume 18 

Publisher: Association for Computational Linguistics 

Full text available: pdf(96.17 KB) Additional Information: full citation, abstract, references, citings

In this paper we report results of a supervised machine-learning approach to Chinese 
word segmentation. First, a maximum entropy tagger is trained on manually annotated 
data to automatically label the characters with tags that indicate the position of character
within a word. An error-driven transformation-based tagger is then trained to clean up the 
tagging inconsistencies of the first tagger. The tagged output is then converted into 
segmented text. The preliminary results show that this appro ... 

67 Extracting the unextractable: a case study on verb-particles
Timothy Baldwin, Aline Villavicencio

August 2002 proceeding of the 6th conference on Natural language learning - Volume
20 COLING-02

Publisher: Association for Computational Linguistics 

Full text available: pdf(173.70 KB) Additional Information: full citation, abstract, references, citings

This paper proposes a series of techniques for extracting English verb-particle
constructions from raw text corpora. We initially propose three basic methods, based on
tagger output, chunker output and a chunk grammar, respectively, with the chunk
grammar method optionally combining with an attachment resolution module to
determine the syntactic structure of verb-preposition pairs in ambiguous constructs. We
then combine the three methods together into a single classifier, and add in a number ... 

68 Combining heterogeneous classifiers for word-sense disambiguation 

Dan Klein, Kristina Toutanova, H. Tolga Ilhan, Sepandar D. Kamvar, Christopher D. Manning
July 2002 Proceedings of the ACL-02 workshop on Word sense disambiguation: 
recent successes and future directions - Volume 8 

Publisher: Association for Computational Linguistics 

Full text available: pdf(157.04 KB) Additional Information: full citation, abstract, references

This paper discusses ensembles of simple but heterogeneous classifiers for word-sense 
disambiguation, examining the Stanford-CS224N system entered in the SENSEVAL-2 
English lexical sample task. First-order classifiers are combined by a second-order 
classifier, which variously uses majority voting, weighted voting, or a maximum entropy
model. While individual first-order classifiers perform comparably to middle-scoring
teams' systems, the combination achieves high performance. We discuss trade-of ...

69 Discriminative training methods for hidden Markov models: theory and experiments
with perceptron algorithms 

Michael Collins 

July 2002 Proceedings of the ACL-02 conference on Empirical methods in natural 
language processing - Volume 10 EMNLP '02 

Publisher: Association for Computational Linguistics 

Full text available: pdf(295.08 KB) Additional Information: full citation, abstract, references, citings

We describe new algorithms for training tagging models, as an alternative to maximum- 
entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi 
decoding of training examples, combined with simple additive updates. We describe 
theory justifying the algorithms through a modification of the proof of convergence of the 
perceptron algorithm for classification problems. We give experimental results on part-of- 
speech tagging and base noun phrase chunking, in both cases showing I ... 

70 A hybrid approach to natural language web search 
Jennifer Chu-Carroll, John Prager, Yael Ravin, Christian Cesar 

July 2002 Proceedings of the ACL-02 conference on Empirical methods in natural
language processing - Volume 10 EMNLP '02 

Publisher: Association for Computational Linguistics 

Full text available: pdf(118.21 KB) Additional Information: full citation, abstract, references, citings

We describe a hybrid approach to improving search performance by providing a natural 
language front end to a traditional keyword-based search engine. The key component of 
the system is iterative query formulation and retrieval, in which one or more queries are 
automatically formulated from the user's question, issued to the search engine, and the 
results accumulated to form the hit list. New queries are generated by relaxing 
previously-issued queries using transformation rules, applied in an ord ... 

71 Machine learning in automated text categorization
Fabrizio Sebastiani

March 2002 ACM Computing Surveys (CSUR), Volume 34 issue 1
Publisher: ACM Press 

Full text available: pdf(524.41 KB) Additional Information: full citation, abstract, references, citings, index terms

The automated categorization (or classification) of texts into predefined categories has
witnessed a booming interest in the last 10 years, due to the increased availability of
documents in digital form and the ensuing need to organize them. In the research
community the dominant approach to this problem is based on machine learning
techniques: a general inductive process automatically builds a classifier by learning, from 
a set of preclassifled documents, the characteristics of the categories. ... 

Keywords: Machine learning, text categorization, text classification 



72 Automatic verb classification based on statistical distributions of argument structure 
Paola Merlo, Suzanne Stevenson 

September 2001 Computational Linguistics, Volume 27 issue 3 
Publisher: MIT Press

Full text available: pdf(341.42 KB), Publisher Site Additional Information: full citation, abstract, references, citings

Automatic acquisition of lexical knowledge is critical to a wide range of natural language 
processing tasks. Especially important is knowledge about verbs, which are the primary
source of relational information in a sentence - the predicate-argument structure that
relates an action or state to its participants (i.e., who did what to whom). In this work, we 
report on supervised learning experiments to automatically classify three major types of 
English verbs, based on their argument structure—sp ... 

73 XML-based data preparation for robust deep parsing 
Claire Grover, Alex Lascarides

July 2001 Proceedings of the 39th Annual Meeting on Association for Computational 
Linguistics ACL '01 

Publisher: Association for Computational Linguistics 

Full text available: pdf(87.15 KB) Additional Information: full citation, abstract, references, citings

We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus
for parsing. Our techniques are generally applicable but here we focus on parsing Medline 
abstracts with the ANLT wide-coverage grammar. Hand-crafted grammars inevitably lack 
coverage but many coverage failures are due to inadequacies of their lexicons. We 
describe a method of gaining a degree of robustness by interfacing POS tag information
with the existing lexicon. We also show that XML tools provide a soph ... 

74 What is the minimal set of fragments that achieves maximal parse accuracy? 

Rens Bod 

July 2001 Proceedings of the 39th Annual Meeting on Association for Computational
Linguistics ACL '01

Publisher: Association for Computational Linguistics

Full text available: pdf(69.87 KB) Additional Information: full citation, abstract, references, citings

We aim at finding the minimal set of fragments which achieves maximal parse accuracy in 
Data Oriented Parsing. Experiments with the Penn Wall Street Journal treebank show that 
counts of almost arbitrary fragments within parse trees are important, leading to 
improved parse accuracy over previous models tested on this treebank (a precision of 
90.8% and a recall of 90.6%). We isolate some dependency relations which previous
models neglect but which contribute to higher parse accuracy. 

75 Beyond standard CFG parsing: New ranking algorithms for parsing and tagging:
kernels over discrete structures, and the voted perceptron 

Michael Collins, Nigel Duffy 

July 2001 Proceedings of the 40th Annual Meeting on Association for Computational 
Linguistics ACL '02 

Publisher: Association for Computational Linguistics

Full text available: pdf(137.02 KB) Additional Information: full citation, abstract, references, citings

This paper introduces new learning algorithms for natural language processing based on 
the perceptron algorithm. We show how the algorithms can be efficiently applied to 
exponential sized representations of parse trees, such as the "all subtrees" (DOP)
representation described by (Bod 1998), or a representation tracking all sub-fragments of 
a tagged sentence. We give experimental results showing significant improvements on 
two tasks: parsing Wall Street Journal text, and named-entity extraction ... 

76 The form is the substance: classification of genres in text 
Nigel Dewdney, Carol Van Ess-Dykema, Richard MacMillan

July 2001 Proceedings of the workshop on Human Language Technology and 
Knowledge Management - Volume 2001 

Publisher: Association for Computational Linguistics 

Full text available: pdf(64.60 KB) Additional Information: full citation, abstract, references

Categorization of text in IR has traditionally focused on topic. As use of the Internet and 
e-mail increases, categorization has become a key area of research as users demand 
methods of prioritizing documents. This work investigates text classification by format
style, i.e. "genre", and demonstrates, by complementing topic classification, that it can 
significantly improve retrieval of information. The paper compares use of presentation 
features to word features, and the combination thereof, usin ... 

Hans van Halteren, Walter Daelemans, Jakub Zavrel
June 2001 Computational Linguistics, Volume 27 issue 2 
Publisher: MIT Press 

Full text available: pdf
We examine how differences in language models, learned by different data-driven
systems performing the same NLP task, can be exploited to yield a higher accuracy than 
the best individual system. We do this by means of experiments involving the task of 
morphosyntactic word class tagging, on the basis of three different tagged corpora. Four 
well-known tagger generators (hidden Markov model, memory-based, transformation 
rules, and maximum entropy) are trained on the same corpus data. After comparis ... 

78 Presentations: Two statistical parsing models applied to the Chinese Treebank 

Daniel M. Bikel, David Chiang 

October 2000 Proceedings of the second workshop on Chinese language processing: 
held in conjunction with the 38th Annual Meeting of the Association for 
Computational Linguistics - Volume 12 

Publisher: Association for Computational Linguistics 

Full text available: pdf(525.60 KB) Additional Information: full citation, abstract, references, citings

This paper presents the first-ever results of applying statistical parsing models to the 
newly-available Chinese Treebank. We have employed two models, one extracted and 
adapted from BBN's SIFT System (Miller et al., 1998) and a TAG-based parsing model, 
adapted from (Chiang, 2000). On sentences with ≤40 words, the former model performs
at 69% precision, 75% recall, and the latter at 77% precision and 78% recall. 

79 Comparison between tagged corpora for the named entity task
Chikashi Nobata, Nigel Collier, Jun'ichi Tsujii 

October 2000 Proceedings of the workshop on Comparing corpora - Volume 9 
We present two measures for comparing corpora based on information theory statistics
such as gain ratio as well as simple term-class frequency counts. We tested the 
predictions made by these measures about corpus difficulty in two domains — news and 
molecular biology — using the result of two well-used paradigms for NE, decision trees 
and HMMs and found that gain ratio was the more reliable predictor. 

80 Regular papers: Applying system combination to base noun phrase identification 
Erik F. Tjong Kim Sang, Walter Daelemans, Herve Dejean, Rob Koeling, Yuval Krymolowski, 
Vasin Punyakanok, Dan Roth 

July 2000 Proceedings of the 18th conference on Computational linguistics - Volume 2

Publisher: Association for Computational Linguistics 

Full text available: pdf(678.60 KB) Additional Information: full citation, abstract, references, citings

We use seven machine learning algorithms for one task: identifying base noun phrases. 
The results have been processed by different system combination methods and all of 
these outperformed the best individual result. We have applied the seven learners with 
the best combinator, a majority vote of the top five systems, to a standard data set and 
managed to improve the best published result for this data set. 
81 The maximum entropy approach and probabilistic IR models
July 2000 ACM Transactions on Information Systems (TOIS), Volume 18 issue 3
Full text available: pdf(246.45 KB)



This paper takes a fresh look at modeling approaches to information retrieval that have 
been the basis of much of the probabilistically motivated IR research over the last 20 
years. We shall adopt a subjectivist Bayesian view of probabilities and argue that classical
work on probabilistic retrieval is best understood from this perspective. The main focus of 
the paper will be the ranking formulas corresponding to the Binary Independence Model 
(BIM), presented originally by Robertson and Spar ...

Keywords: idf weighting, binary independence model, combination match, linked 
dependence, probability ranking principle 



82 Web mining research: a survey
Raymond Kosala, Hendrik Blockeel 

June 2000 ACM SIGKDD Explorations Newsletter, Volume 2 issue 1
83 Assigning function tags to parsed text
Don Blaheta, Eugene Charniak 

April 2000 Proceedings of the first conference on North American chapter of the 
Association for Computational Linguistics 

Publisher: Morgan Kaufmann Publishers Inc. 

Full text available: pdf(602.13 KB) Additional Information: full citation, abstract, references, citings

It is generally recognized that the common nonterminal labels for syntactic constituents 
(NP, VP, etc.) do not exhaust the syntactic and semantic information one would like about
parts of a syntactic tree. For example, the Penn Tree-bank gives each constituent zero or
more 'function tags' indicating semantic roles and other related information not easily
encapsulated in the simple constituent labels. We present a statistical algorithm for
assigning these function tags that, on text already parse ... 

84 Information extraction: Information extraction research and applications: current 
progress and future directions 

Andrew Kehler, Jerry R. Hobbs, Douglas Appelt, John Bear, Matthew Caywood, David Israel, 
Megumi Kameyama, David Martin, Claire Monteleoni 

October 1998 Proceedings of a workshop on held at Baltimore, Maryland: October 13-15,
1998

Publisher: Association for Computational Linguistics 

Full text available: pdf(1.24 MB) Additional Information: full citation, abstract, references

Analysts face a daunting task: they must accurately analyze, categorize, and assimilate a 
large body of information from a variety of sources and for a variety of domains of 
interest. The complexity of the task necessitates a variety of information access and 
extraction tools which technology up to this point has not been able to provide. SRI's
TIPSTER Phase III project has focused on two major obstacles to the development of such 
tools: inadequate degrees of accuracy and portability. We begin b ... 

85 Feature lattices for maximum entropy modelling 

Andrei Mikheev 

August 1998 Proceedings of the 17th international conference on Computational 
linguistics - Volume 2, Proceedings of the 36th annual meeting on
Association for Computational Linguistics - Volume 2 

Publisher: Association for Computational Linguistics 

Full text available: pdf(664.80 KB) Additional Information: full citation, abstract, references, citings

Maximum entropy framework proved to be expressive and powerful for the statistical 
language modelling, but it suffers from the computational expensiveness of the model 
building. The iterative scaling algorithm that is used for the parameter estimation is 
computationally expensive while the feature selection process might require to estimate 
parameters for many candidate features many times. In this paper we present a novel 
approach for building maximum entropy models. Our approach uses the featu ... 

86 Memory-based learning: using similarity for smoothing 
Jakub Zavrel, Walter Daelemans 

July 1997 Proceedings of the eighth conference on European chapter of the
Association for Computational Linguistics, Proceedings of the 35th annual
meeting on Association for Computational Linguistics

Publisher: Association for Computational Linguistics 

Full text available: pdf(764.37 KB), Publisher Site Additional Information: full citation, abstract, references, citings

This paper analyses the relation between the use of similarity in Memory-Based Learning 
and the notion of backed-off smoothing in statistical language modeling. We show that 
the two approaches are closely related, and we argue that feature weighting methods in 
the Memory-Based paradigm can offer the advantage of automatically specifying a 
suitable domain-specific hierarchy between most specific and most general conditioning 
information without the need for a large number of parameters. We report ... 

87 Independence assumptions considered harmful 
Alexander Franz 

July 1997 Proceedings of the eighth conference on European chapter of the
Association for Computational Linguistics, Proceedings of the 35th annual
meeting on Association for Computational Linguistics

Many current approaches to statistical language modeling rely on independence
assumptions between the different explanatory variables. This results in models which are
computationally simple, but which only model the main effects of the explanatory 
variables on the response variable. This paper presents an argument in favor of a 
statistical approach that also models the interactions between the explanatory variables. 
The argument rests on empirical evidence from two series of experiments concern ...

88 A new statistical parser based on bigram lexical dependencies

Michael John Collins 

June 1996 Proceedings of the 34th annual meeting on Association for Computational 
Linguistics 

Publisher: Association for Computational Linguistics 

Full text available: pdf(737.31 KB)
This paper describes a new statistical parser which is based on probabilities of
dependencies between head-words in the parse tree. Standard bigram probability 
estimation techniques are extended to calculate probabilities of dependencies between
pairs of words. Tests using Wall Street Journal data show that the method performs at 
least as well as SPATTER (Magerman 95; Jelinek et al. 94), which has the best published 
results for a statistical parser on this task. The simplicity of the approach me ... 

89 A maximum entropy approach to natural language processing 
Adam L. Berger, Vincent J. Della Pietra, Stephen A. Della Pietra
March 1996 Computational Linguistics, Volume 22 issue 1
Publisher: MIT Press 

Full text available: ^ . 
The concept of maximum entropy can be traced back along multiple threads to Biblical
times. Only recently, however, have computers become powerful enough to permit the 
widescale application of this concept to real world problems in statistical estimation and 
pattern recognition. In this paper, we describe a method for statistical modeling based on 
maximum entropy. We present a maximum-likelihood approach for automatically 
constructing maximum entropy models and describe how to implement this app ... 

90 Parallel text search methods 
Gerard Salton, Chris Buckley 

February 1988 Communications of the ACM, Volume 31 issue 2
Publisher: ACM Press 

Full text available: pdf(1.63 MB) Additional Information: full citation, abstract, references, citings, index
terms, review

A comparison of recently proposed parallel text search methods to alternative available 
search strategies that use serial processing machines suggests parallel methods do not 
provide large-scale gains in either retrieval effectiveness or efficiency. 